Search | VHL Regional Portal

1.

MetaTron: advancing biomedical annotation empowering relation annotation and collaboration.

Irrera, Ornella; Marchesin, Stefano; Silvello, Gianmaria.

BMC Bioinformatics ; 25(1): 112, 2024 Mar 14.

Article in English | MEDLINE | ID: mdl-38486137

ABSTRACT

BACKGROUND: The constant growth of biomedical data is accompanied by the need for new methodologies to effectively and efficiently extract machine-readable knowledge for training and testing purposes. A crucial aspect in this regard is creating large annotated corpora, often built manually or semi-manually, which are vital for developing effective and efficient methods for tasks like relation extraction, topic recognition, and entity linking. However, manual annotation is expensive and time-consuming, especially if not assisted by interactive, intuitive, and collaborative computer-aided tools. To support healthcare experts in the annotation process and foster annotated corpora creation, we present MetaTron. MetaTron is an open-source and free-to-use web-based annotation tool to annotate biomedical data interactively and collaboratively; it supports both mention-level and document-level annotations and integrates automatic built-in predictions. Moreover, MetaTron enables relation annotation with the support of ontologies, functionalities often overlooked by off-the-shelf annotation tools. RESULTS: We conducted a qualitative analysis to compare MetaTron with a set of manual annotation tools including TeamTat, INCEpTION, LightTag, MedTAG, and brat, on three sets of criteria: technical, data, and functional. A quantitative evaluation allowed us to assess MetaTron's performance in terms of time and number of clicks to annotate a set of documents. The results indicated that MetaTron fulfills almost all the selected criteria and achieves the best performance. CONCLUSIONS: MetaTron stands out as one of the few annotation tools targeting the biomedical domain that supports the annotation of relations and is fully customizable with documents in several formats (PDF included), as well as abstracts retrieved from PubMed, Semantic Scholar, and OpenAIRE. To meet any user need, we released MetaTron both as an online instance and as a locally deployable Docker image.

Subjects

Psychological Power , Semantics , PubMed

2.

Modelling digital health data: The ExaMode ontology for computational pathology.

Menotti, Laura; Silvello, Gianmaria; Atzori, Manfredo; Boytcheva, Svetla; Ciompi, Francesco; Di Nunzio, Giorgio Maria; Fraggetta, Filippo; Giachelle, Fabio; Irrera, Ornella; Marchesin, Stefano; Marini, Niccolò; Müller, Henning; Primov, Todor.

J Pathol Inform ; 14: 100332, 2023.

Article in English | MEDLINE | ID: mdl-37705689

ABSTRACT

Computational pathology can significantly benefit from ontologies to standardize the employed nomenclature and help with knowledge extraction processes for high-quality annotated image datasets. The end goal is to reach a shared model for digital pathology to overcome data variability and integration problems. Indeed, data annotation in such a specific domain is still an unsolved challenge, and datasets cannot be readily reused in diverse contexts due to heterogeneity issues of the adopted labels, multilingualism, and different clinical practices. Material and methods: This paper presents the ExaMode ontology, modeling the histopathology process by considering 3 key cancer diseases (colon, cervical, and lung tumors) and celiac disease. The ExaMode ontology has been designed bottom-up in an iterative fashion with continuous feedback and validation from pathologists and clinicians. The ontology is organized into 5 semantic areas that define an ontological template to model any disease of interest in histopathology. Results: The ExaMode ontology is currently being used as a common semantic layer in: (i) an entity linking tool for the automatic annotation of medical records; (ii) a web-based collaborative annotation tool for histopathology text reports; and (iii) a software platform for building holistic solutions integrating multimodal histopathology data. Discussion: The ExaMode ontology is a key means to store data in a graph database according to the RDF data model. The creation of an RDF dataset can help develop more accurate algorithms for image analysis, especially in the field of digital pathology. This approach allows for seamless data integration and a unified query access point, from which we can extract relevant clinical insights about the considered diseases using SPARQL queries.

3.

Building a large gene expression-cancer knowledge base with limited human annotations.

Marchesin, Stefano; Menotti, Laura; Giachelle, Fabio; Silvello, Gianmaria; Alonso, Omar.

Database (Oxford) ; 2023, 2023 Sep 27.

Article in English | MEDLINE | ID: mdl-37768281

ABSTRACT

Cancer prevention is one of the most pressing challenges that public health needs to face. In this regard, data-driven research is central to assist medical solutions targeting cancer. To fully harness the power of data-driven research, it is imperative to organize machine-readable facts into a knowledge base (KB). Motivated by this urgent need, we introduce the Collaborative Oriented Relation Extraction (CORE) system for building KBs with limited manual annotations. CORE is based on the combination of distant supervision and active learning paradigms and offers a seamless, transparent, modular architecture equipped for large-scale processing. We focus on precision medicine and build the largest KB on 'fine-grained' gene expression-cancer associations, a key to complement and validate experimental data for cancer research. We show the robustness of CORE and discuss the usefulness of the provided KB. Database URL: https://zenodo.org/record/7577127.

Subjects

Neoplasms , Humans , Neoplasms/genetics , Factual Databases , Knowledge Bases , Precision Medicine , Gene Expression

4.

Empowering digital pathology applications through explainable knowledge extraction tools.

Marchesin, Stefano; Giachelle, Fabio; Marini, Niccolò; Atzori, Manfredo; Boytcheva, Svetla; Buttafuoco, Genziana; Ciompi, Francesco; Di Nunzio, Giorgio Maria; Fraggetta, Filippo; Irrera, Ornella; Müller, Henning; Primov, Todor; Vatrano, Simona; Silvello, Gianmaria.

J Pathol Inform ; 13: 100139, 2022.

Article in English | MEDLINE | ID: mdl-36268087

ABSTRACT

Exa-scale volumes of medical data have been produced for decades. In most cases, the diagnosis is reported in free text, encoding medical knowledge that is still largely unexploited. To decode the medical knowledge included in reports, we propose an unsupervised knowledge extraction system combining a rule-based expert system with pre-trained Machine Learning (ML) models, namely the Semantic Knowledge Extractor Tool (SKET). Combining rule-based techniques and pre-trained ML models provides high-accuracy results for knowledge extraction. This work demonstrates the viability of unsupervised Natural Language Processing (NLP) techniques to extract critical information from cancer reports, opening opportunities such as data mining for knowledge extraction purposes, precision medicine applications, structured report creation, and multimodal learning. SKET is a practical and unsupervised approach to extracting knowledge from pathology reports, which opens up unprecedented opportunities to exploit textual and multimodal medical information in clinical practice. We also propose SKET eXplained (SKET X), a web-based system providing visual explanations about the algorithmic decisions taken by SKET. SKET X is designed and developed to support pathologists and domain experts in understanding SKET predictions, possibly driving further improvements to the system.

5.

Unleashing the potential of digital pathology data by training computer-aided diagnosis models without human annotations.

Marini, Niccolò; Marchesin, Stefano; Otálora, Sebastian; Wodzinski, Marek; Caputo, Alessandro; van Rijthoven, Mart; Aswolinskiy, Witali; Bokhorst, John-Melle; Podareanu, Damian; Petters, Edyta; Boytcheva, Svetla; Buttafuoco, Genziana; Vatrano, Simona; Fraggetta, Filippo; van der Laak, Jeroen; Agosti, Maristella; Ciompi, Francesco; Silvello, Gianmaria; Müller, Henning; Atzori, Manfredo.

NPJ Digit Med ; 5(1): 102, 2022 Jul 22.

Article in English | MEDLINE | ID: mdl-35869179

ABSTRACT

The digitalization of clinical workflows and the increasing performance of deep learning algorithms are paving the way towards new methods for tackling cancer diagnosis. However, the availability of medical specialists to annotate digitized images and free-text diagnostic reports does not scale with the need for large datasets required to train robust computer-aided diagnosis methods that can target the high variability of clinical cases and data produced. This work proposes and evaluates an approach to eliminate the need for manual annotations to train computer-aided diagnosis tools in digital pathology. The approach includes two components: one automatically extracts semantically meaningful concepts from diagnostic reports, and the other uses them as weak labels to train convolutional neural networks (CNNs) for histopathology diagnosis. The approach is trained (through 10-fold cross-validation) on 3,769 clinical images and reports provided by two hospitals, and tested on over 11,000 images from private and publicly available datasets. The CNN, trained with automatically generated labels, is compared with the same architecture trained with manual labels. Results show that combining text analysis and end-to-end deep neural networks allows building computer-aided diagnosis tools that reach solid performance (micro-accuracy = 0.908 at image-level) based only on existing clinical data without the need for manual annotations.
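
As a rough illustration of the weak-labelling idea described in this abstract, the sketch below turns a free-text report into a multi-label target by matching keywords. It is not the authors' concept extraction pipeline: the class names, the keyword lists, and the `weak_labels` function are all hypothetical, and plain substring matching stands in for a real extraction step.

```python
# Toy weak labelling: derive per-class 0/1 labels from a diagnostic report.
# Class names and keyword lists are hypothetical, chosen only for illustration.
CONCEPT_KEYWORDS = {
    "adenocarcinoma": ("adenocarcinoma",),
    "dysplasia": ("dysplasia",),
    "hyperplastic_polyp": ("hyperplastic polyp",),
}


def weak_labels(report: str) -> dict[str, int]:
    """Return 1 for each class whose keyword occurs in the report, else 0."""
    text = report.lower()
    return {
        concept: int(any(keyword in text for keyword in keywords))
        for concept, keywords in CONCEPT_KEYWORDS.items()
    }


labels = weak_labels("Low-grade dysplasia in tubular adenoma.")
print(labels)
```

Substring matching ignores negation ("no evidence of dysplasia" would still be labelled positive), which is one reason a real system needs more than keyword lookup before such labels can supervise an image classifier.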

6.

TBGA: a large-scale Gene-Disease Association dataset for Biomedical Relation Extraction.

Marchesin, Stefano; Silvello, Gianmaria.

BMC Bioinformatics ; 23(1): 111, 2022 Mar 31.

Article in English | MEDLINE | ID: mdl-35361129

ABSTRACT

BACKGROUND: Databases are fundamental to advance biomedical science. However, most of them are populated and updated with a great deal of human effort. Biomedical Relation Extraction (BioRE) aims to shift this burden to machines. Among its different applications, the discovery of Gene-Disease Associations (GDAs) is one of BioRE's most relevant tasks. Nevertheless, few resources have been developed to train models for GDA extraction. Besides, these resources are all limited in size, preventing models from scaling effectively to large amounts of data. RESULTS: To overcome this limitation, we have exploited the DisGeNET database to build a large-scale, semi-automatically annotated dataset for GDA extraction. DisGeNET stores one of the largest available collections of genes and variants involved in human diseases. Relying on DisGeNET, we developed TBGA: a GDA extraction dataset generated from more than 700K publications that consists of over 200K instances and 100K gene-disease pairs. Each instance consists of the sentence from which the GDA was extracted, the corresponding GDA, and the information about the gene-disease pair. CONCLUSIONS: TBGA is amongst the largest datasets for GDA extraction. We have evaluated state-of-the-art models for GDA extraction on TBGA, showing that it is a challenging and well-suited dataset for the task. We made the dataset publicly available to foster the development of state-of-the-art BioRE models for GDA extraction.
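
The instance structure described in this abstract (a sentence, the GDA extracted from it, and the gene-disease pair) could be modelled roughly as follows. This is a minimal sketch: the field names, the identifiers, and the `count_gene_disease_pairs` helper are hypothetical and do not reflect the actual TBGA file schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GDAInstance:
    """One relation extraction instance (hypothetical field names)."""
    sentence: str     # sentence from which the association was extracted
    gene_id: str      # identifier of the gene in the pair
    disease_id: str   # identifier of the disease in the pair
    relation: str     # the gene-disease association label


def count_gene_disease_pairs(instances: list[GDAInstance]) -> int:
    """Count distinct (gene, disease) pairs across all instances."""
    return len({(inst.gene_id, inst.disease_id) for inst in instances})


# Made-up examples: two sentences mention the same pair, one mentions another.
examples = [
    GDAInstance("Sentence one.", "GENE_A", "DISEASE_X", "label_1"),
    GDAInstance("Sentence two.", "GENE_A", "DISEASE_X", "label_2"),
    GDAInstance("Sentence three.", "GENE_B", "DISEASE_X", "label_1"),
]
print(count_gene_disease_pairs(examples))
```

Separating instances from pairs in this way mirrors why the abstract reports more instances than gene-disease pairs: the same pair can be supported by several sentences.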

Subjects

Data Mining , Research Design , Factual Databases , Humans

7.

MedTAG: a portable and customizable annotation tool for biomedical documents.

Giachelle, Fabio; Irrera, Ornella; Silvello, Gianmaria.

BMC Med Inform Decis Mak ; 21(1): 352, 2021 Dec 18.

Article in English | MEDLINE | ID: mdl-34922517

ABSTRACT

BACKGROUND: Semantic annotators and Natural Language Processing (NLP) methods for Named Entity Recognition and Linking (NER+L) require plenty of training and test data, especially in the biomedical domain. Despite the abundance of unstructured biomedical data, the lack of richly annotated biomedical datasets hinders the further development of NER+L algorithms for any effective secondary use. In addition, manual annotation of biomedical documents performed by physicians and experts is a costly and time-consuming task. To support, organize, and speed up the annotation process, we introduce MedTAG, a collaborative biomedical annotation tool that is open-source, platform-independent, and free to use/distribute. RESULTS: We present the main features of MedTAG and how it has been employed in the histopathology domain by physicians and experts to annotate more than seven thousand clinical reports manually. We compare MedTAG with a set of well-established biomedical annotation tools, including BioQRator, ezTag, MyMiner, and tagtog, comparing their pros and cons with those of MedTAG. We highlight that MedTAG is one of the very few open-source tools provided with an open license and a straightforward installation procedure supporting cross-platform use. CONCLUSIONS: MedTAG has been designed according to five requirements (i.e. available, distributable, installable, workable and schematic) defined in a recent extensive review of manual annotation tools. Moreover, MedTAG satisfies 20 out of 22 criteria specified in the same study.

Subjects

Names , Natural Language Processing , Algorithms , Humans , Semantics

8.

Search, access, and explore life science nanopublications on the Web.

Giachelle, Fabio; Dosso, Dennis; Silvello, Gianmaria.

PeerJ Comput Sci ; 7: e335, 2021.

Article in English | MEDLINE | ID: mdl-33816986

ABSTRACT

Nanopublications are Resource Description Framework (RDF) graphs encoding scientific facts extracted from the literature and enriched with provenance and attribution information. There are millions of nanopublications currently available on the Web, especially in the life science domain. Nanopublications are thought to facilitate the discovery, exploration, and re-use of scientific facts. Nevertheless, they are still not widely used by scientists outside specific circles; they are hard to find and rarely cited. We believe this is due to the lack of services to seek, find, and understand nanopublications' content. To this end, we present the NanoWeb application to seamlessly search, access, explore, and re-use the nanopublications publicly available on the Web. For the time being, NanoWeb focuses on the life science domain, where the largest number of nanopublications is available. It is a unified access point to the world of nanopublications, enabling search over graph data, direct connections to evidence papers and curated scientific databases, and visual and intuitive exploration of the relation network created by the encoded scientific facts.

9.

Data Citation: a Computational Challenge.

Davidson, Susan B; Buneman, Peter; Deutch, Daniel; Milo, Tova; Silvello, Gianmaria.

Proc ACM SIGACT SIGMOD SIGART Symp Princ Database Syst ; 2017: 1-4, 2017 May.

Article in English | MEDLINE | ID: mdl-29051698

ABSTRACT

Data citation is an interesting computational challenge, whose solution draws on several well-studied problems in database theory: query answering using views, and provenance. We describe the problem, suggest an approach to its solution, and highlight several open research problems, both practical and theoretical.

10.

Automating data citation: the eagle-i experience.

Alawini, Abdussalam; Chen, Leshang; Davidson, Susan B; Da Silva, Natan Portilho; Silvello, Gianmaria.

Proc ACM/IEEE Joint Conf Digit Libr ; 2017, 2017 Jun.

Article in English | MEDLINE | ID: mdl-29599662

ABSTRACT

Data citation is of growing concern for owners of curated databases, who wish to give credit to the contributors and curators responsible for portions of the dataset and enable the data retrieved by a query to be later examined. While several databases specify how data should be cited, they leave it to users to manually construct the citations and do not generate them automatically. We report our experiences in automating data citation for an RDF dataset called eagle-i, and discuss how to generalize this to a citation framework that can work across a variety of different types of databases (e.g. relational, XML, and RDF). We also describe how a database administrator would use this framework to automate citation for a particular dataset.
